-
Notifications
You must be signed in to change notification settings - Fork 12.7k
Fix GPT-OSS Harmony format end token handling crash #15195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The GPT-OSS model uses OpenAI's Harmony format which includes an <|end|> token after the final message. The parser wasn't handling this token properly, causing finish() to throw 'Unexpected content at end of input'. Modified the GPT-OSS parser to strip the <|end|> token from the content if present. Added comprehensive tests for GPT-OSS format with and without the end token to ensure backward compatibility.
… by unconsumed <|end|> tokens in the chat parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Fix server crash when using GPT-OSS models with tools that was caused by unconsumed <|end|>
tokens in the chat parser.
- Modified GPT-OSS chat parser to properly handle
<|end|>
tokens in both tool and non-tool scenarios - Added reasoning format parameter passing from templates to server configuration
- Added comprehensive test coverage for GPT-OSS Harmony format parsing edge cases
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
common/chat.h | Added reasoning_format field to common_chat_params struct for template-specific configuration |
common/chat.cpp | Fixed GPT-OSS parser to consume remaining content and handle `< |
tools/server/utils.hpp | Added reasoning_format parameter passing from chat templates to server parameters |
tools/server/server.cpp | Updated server to use template-provided reasoning format with proper fallback logic |
tests/test-chat-parser.cpp | Added comprehensive test suite for GPT-OSS Harmony format edge cases and token handling |
common/chat.cpp
Outdated
} else { | ||
// No <|end|> token, consume everything remaining | ||
if (!builder.syntax().parse_tool_calls) { | ||
builder.add_content(builder.consume_rest()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is unreachable because it's inside an else block that follows a return statement that was removed. The logic flow suggests this should be part of the else clause for the if (!builder.syntax().parse_tool_calls) condition.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made changes that address this in subsequent commits.
I see and I am merging the changes from #15181 against my own in another branch. |
Summary
Fix server crash when using GPT-OSS models with tools that was caused by unconsumed
<|end|>
tokens in the chat parser.Changes
Module:
chat-parser
common_chat_parse_gpt_oss()
only consumed content whenparse_tool_calls=false
, leaving<|end|>
token unconsumed whenparse_tool_calls=true
(tools enabled)parse_tool_calls
casesManual Validation
Before Fix:
After Fix: